DRONE: Data-aware Low-rank Compression for Large NLP Models

Neural Information Processing Systems

The representations learned by large-scale NLP models such as BERT have been widely used in various tasks. However, the increasing size of the pre-trained models also brings efficiency challenges, including inference speed and model size, when deploying models on mobile devices. Specifically, most operations in BERT consist of matrix multiplications. These matrices are not low-rank, so canonical matrix decomposition cannot find an efficient approximation. In this paper, we observe that the learned representation of each layer lies in a low-dimensional space.
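
The observation above (the weights themselves are full-rank, but the activations they see occupy a low-dimensional subspace) can be illustrated with a small sketch. The snippet below is a simplified illustration of data-aware low-rank approximation, not the paper's exact closed-form solution: it factorizes a weight matrix W using the top singular directions of its outputs WX on observed activations X, so the factorization is accurate on the data even when W itself is far from low-rank. All names (W, X, k) are illustrative.

```python
# Minimal sketch of data-aware low-rank approximation (illustrative only;
# the paper derives a different closed-form optimum). We approximate a
# weight matrix W by a rank-k factorization chosen with respect to the
# observed input activations X, i.e. we target ||W X - W1 W2 X||_F
# rather than ||W - W1 W2||_F.
import numpy as np

def data_aware_low_rank(W, X, k):
    """Return factors (W1, W2) with W1 @ W2 ~ W on the data subspace of X."""
    Y = W @ X                                   # outputs actually produced on data
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    U_k = U[:, :k]                              # top-k output directions on the data
    W1 = U_k                                    # (out_dim, k)
    W2 = U_k.T @ W                              # (k, in_dim)
    return W1, W2

# Toy usage: a full-rank "weight" whose inputs occupy a low-dimensional subspace.
rng = np.random.default_rng(0)
W = rng.standard_normal((768, 768))
X = rng.standard_normal((768, 64)) @ rng.standard_normal((64, 1024))  # low-rank data
W1, W2 = data_aware_low_rank(W, X, k=64)
err_data = np.linalg.norm(W @ X - W1 @ (W2 @ X)) / np.linalg.norm(W @ X)
err_plain = np.linalg.norm(W - W1 @ W2) / np.linalg.norm(W)
print(f"relative error on data: {err_data:.3e}, on the full weight: {err_plain:.3e}")
```

On such data the factorization is near-exact for the outputs (`err_data` close to zero) even though the weight itself is poorly approximated (`err_plain` large), which is the point of making the compression data-aware.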


Appendix for: Data-Aware Low-Rank Compression for Large NLP Models

Neural Information Processing Systems

In addition, a pre-defined search grid is also necessary. With these input parameters, we first distribute the total allowed loss into each individual module. There is indeed a trade-off between efficiency and efficacy, as the speedup ratio goes higher at the cost of lower accuracy; thus, in a real application, users need to decide what trade-off is best for their setting. We could have chosen another cutoff to report, such as 1% accuracy loss with a lower speedup ratio, but this would not help much when comparing different baseline methods.

D.1 LSTM result. A 2-layer LSTM model is composed of two layers of large matrices and one large softmax layer. Thus, even though the matrices are much smaller and well approximated by DRONE, the overall acceleration on GPU is smaller.
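
As a rough illustration of how a total loss budget and a pre-defined rank grid could drive per-module rank selection, here is a hedged sketch, not the paper's exact procedure: the budget is split evenly across modules, and each module takes the smallest rank on the grid whose estimated loss stays within its share. The helper `estimate_loss` and all names below are hypothetical.

```python
# Hedged sketch of budgeted rank selection over a pre-defined search grid
# (illustrative; not the paper's exact allocation rule).
from typing import Callable, Dict, List

def select_ranks(
    modules: List[str],
    rank_grid: List[int],                       # pre-defined search grid, e.g. [64, 128, 256]
    total_loss_budget: float,
    estimate_loss: Callable[[str, int], float],  # hypothetical per-module loss estimator
) -> Dict[str, int]:
    """Pick, per module, the smallest rank whose estimated loss fits its budget share."""
    per_module_budget = total_loss_budget / len(modules)   # distribute the total allowed loss
    chosen: Dict[str, int] = {}
    for m in modules:
        # Walk the grid from the most aggressive (smallest) rank upward and stop
        # at the first rank whose estimated loss is within this module's share.
        for r in sorted(rank_grid):
            if estimate_loss(m, r) <= per_module_budget:
                chosen[m] = r
                break
        else:
            chosen[m] = max(rank_grid)           # fall back to the least aggressive rank
    return chosen

# Toy usage with a dummy loss model in which loss shrinks as rank grows.
dummy_loss = lambda module, rank: 1.0 / rank
print(select_ranks(["attn.query", "ffn.in"], [64, 128, 256], 0.02, dummy_loss))
```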
